-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
ENH: maybe_convert_objects seen NaT speed-up #27300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: maybe_convert_objects seen NaT speed-up #27300
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add an asv that hits this case
Will add. Should I add 'what's new' entry as well? Also - in which file should I add asv - algorithms.py? |
yes that would be great; 0.25.0 performance section. |
|
doc/source/whatsnew/v0.25.0.rst
Outdated
@@ -939,7 +939,7 @@ Performance improvements | |||
- Improved performance by removing the need for a garbage collect when checking for ``SettingWithCopyWarning`` (:issue:`27031`) | |||
- For :meth:`to_datetime` changed default value of cache parameter to ``True`` (:issue:`26043`) | |||
- Improved performance of :class:`DatetimeIndex` and :class:`PeriodIndex` slicing given non-unique, monotonic data (:issue:`27136`). | |||
|
|||
- Improved performance of :meth:`pandas._libs.lib.maybe_convert_objects` for the case when input contains ``NaT``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what hits this from a user perspective? This is a private method which we wouldn’t mention in a what’snew
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What should I write instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on what would touch this from user code. I see DataFrame.from_tuples
and maybe GroupBy ops with datetimelike objects in the result - do you see a difference when using either of those?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
best way is to run the entire asv suite (takes an hour or so)
and see what changes
I believe imports should be in this order: import pandas as pd
from pandas._libs import lib
from pandas.util import testing as tm But when I run |
Also maybe add |
Full asv
returns:
So there are (somehow) downsides of it (or I am using Btw full |
Yea I think there is some noise in there - do some of the regressions even hit this code? |
ok so likely is this path is not hit in our asvs so remove the whatsnew note and looks good |
@@ -13,6 +15,19 @@ | |||
pass | |||
|
|||
|
|||
class MaybeConvertObjects: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we even need this benchmark since it doesn't indicate anything from end user experience but up to @jreback
Break alone in PR lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to add more tests here later as well - for more generalized cases.
thanks @BeforeFlight followups welcome. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff